
    Recovering implicit pitch contours from formants in whispered speech

    Whispered speech is characterised by a noise-like excitation that results in the absence of a fundamental frequency. Since prosodic phenomena such as intonation are perceived through f0 variation, whispered prosody is relatively difficult to perceive. At the same time, studies have shown that speakers do attempt to produce intonation when whispering and that prosodic variability is transmitted, suggesting that intonation "survives" in the whispered formant structure. In this paper, we aim to estimate how formant contours correlate with an "implicit" pitch contour in whisper, using a machine learning model. We propose a two-step method: using a parallel corpus, we first transform the whispered formants into their phonated equivalents with a denoising autoencoder. We then analyse the formant contours to predict phonated pitch contour variation. We observe that our method is effective in establishing a relationship between whispered and phonated formants and in uncovering implicit pitch contours in whisper.
    Comment: 5 pages, 3 figures, 2 tables. Accepted at ICPhS 202
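The two-step method described in this abstract can be sketched as follows. This is a toy illustration under assumptions, not the authors' implementation: the data are synthetic, the models are small scikit-learn regressors standing in for the paper's denoising autoencoder and pitch predictor, and all array shapes are invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy "parallel corpus": flattened F1-F3 trajectories (20 frames x 3 formants).
phonated = rng.normal(size=(200, 60))                               # phonated formants
whispered = phonated + rng.normal(scale=0.3, size=phonated.shape)   # whispered analogue
pitch = phonated @ rng.normal(size=(60,))                           # stand-in pitch target

# Step 1: denoising mapping from whispered to phonated formant contours
# (an MLP here stands in for the paper's denoising autoencoder).
dae = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
dae.fit(whispered, phonated)

# Step 2: predict phonated pitch contour variation from the reconstructed formants.
reg = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
reg.fit(dae.predict(whispered), pitch)

pred = reg.predict(dae.predict(whispered))
print(pred.shape)  # one pitch value per utterance in this toy setup
```

The key design point carried over from the abstract is the chaining: pitch is never predicted from whispered formants directly, but from their phonated reconstructions.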

    Voicing in Polish: interactions with lexical stress and focus

    Malisz Z, Żygis M. Voicing in Polish: interactions with lexical stress and focus. In: 18th International Congress of Phonetic Sciences. Glasgow; In Press.
    We examine the dynamics of VOT in Polish stops under lexical stress and focus. We elicit real Polish words containing voiced and voiceless stop+/a/ syllables in primary-stressed, secondary-stressed and unstressed positions, as well as under focus. We also correlate VOT with speech rate estimated on the basis of equisyllabic word length. Our results show that the relationships between prosody and VOT are consistent with the status of Polish as a true voicing language.

    Speaker-independent neural formant synthesis

    We describe speaker-independent speech synthesis driven by a small set of phonetically meaningful speech parameters such as formant frequencies. The intention is to leverage deep-learning advances to provide a highly realistic signal generator that includes the control affordances required for stimulus creation in the speech sciences. Our approach turns input speech parameters into predicted mel-spectrograms, which are rendered into waveforms by a pre-trained neural vocoder. Experiments with WaveNet and HiFi-GAN confirm that the method achieves our goals of accurate control over speech parameters combined with high perceptual audio quality. We also find that the small set of phonetically relevant speech parameters we use is sufficient to allow for speaker-independent synthesis (a.k.a. universal vocoding).
    Comment: 5 pages, 4 figures. Article accepted at INTERSPEECH 202
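The control flow described above (parameters → mel-spectrogram → vocoder → waveform) can be illustrated schematically. Everything below is a stand-in: the parameter set, the projection, the frame/hop sizes, and the silent placeholder vocoder are assumptions used only to show the pipeline shape, not the paper's trained models.

```python
import numpy as np

N_MELS, N_FRAMES, HOP = 80, 100, 256

def params_to_mel(params: np.ndarray) -> np.ndarray:
    """Stand-in for the learned parameter-to-spectrogram predictor."""
    proj = np.random.default_rng(0).normal(size=(params.shape[-1], N_MELS))
    # Broadcast one parameter frame across time: (N_FRAMES, N_MELS).
    return np.tile(params @ proj, (N_FRAMES, 1))

def vocoder(mel: np.ndarray) -> np.ndarray:
    """Stand-in for a pre-trained neural vocoder (e.g. HiFi-GAN):
    one hop's worth of samples per mel frame."""
    return np.zeros(mel.shape[0] * HOP)  # silent placeholder waveform

# Hypothetical control parameters: F1-F3 and f0, in Hz.
params = np.array([500.0, 1500.0, 2500.0, 120.0])
wave = vocoder(params_to_mel(params))
print(wave.shape)
```

The point of the decomposition is that the mel-spectrogram acts as the fixed interface: any vocoder trained on that representation can render the predicted frames, which is what makes the parameter-driven front end speaker-independent.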

    Acoustic-phonetic realisation of Polish syllable prominence: a corpus study.

    Malisz Z, Wagner P. Acoustic-phonetic realisation of Polish syllable prominence: a corpus study. In: Gibbon D, Hirst D, Campbell N, eds. Rhythm, melody and harmony in speech. Studies in honour of Wiktor Jassem. Speech and Language Technology. Vol 14/15. PoznaƄ, Poland; 2012: 105-114.

    Recording and transcription of speech and gesture in the narration of Polish adults and children

    In the present paper, we describe the experimental procedure, the details of the sound and video recording set-up, and the system for speech and gesture transcription and coding used in the Polish Cartoon Narration Corpus (PCNC) project. The audio-visual data come from a cartoon narration task performed by both children and adults. The recordings are transcribed orthographically and phonemically, and labelled for selected phenomena on a number of levels, including gesture, lexicon, prosody, and dialogue acts.

    Micro-timing of backchannels in human-robot interaction

    Inden B, Malisz Z, Wagner P, Wachsmuth I. Micro-timing of backchannels in human-robot interaction. Presented at the Timing in Human-Robot Interaction: Workshop in Conjunction with the 9th ACM/IEEE International Conference on Human-Robot Interaction (HRI2014), Bielefeld, Germany

    'Ja, mhm, ich verstehe dich' - Oszillator-basiertes Timing multimodaler Feedback-Signale in spontanen Dialogen ['Yes, mhm, I understand you' - Oscillator-based timing of multimodal feedback signals in spontaneous dialogues]

    Wagner P, Inden B, Malisz Z, Wachsmuth I. 'Ja, mhm, ich verstehe dich' - Oszillator-basiertes Timing multimodaler Feedback-Signale in spontanen Dialogen. In: Wolff M, ed. Elektronische Sprachsignalverarbeitung 2012 (Tagungsband ESSV) --- Studientexte zur Sprachkommunikation. Vol 64. Dresden: TUD Press; 2012: 179-187

    'Are you sure you're paying attention?' – 'Uh-huh'. Communicating understanding as a marker of attentiveness

    Buschmeier H, Malisz Z, Wlodarczak M, Kopp S, Wagner P. 'Are you sure you're paying attention?' – 'Uh-huh'. Communicating understanding as a marker of attentiveness. In: Proceedings of INTERSPEECH 2011. International Speech Communication Association; 2011: 2057-2060.
    We report on the first results of an experiment designed to investigate properties of communicative feedback produced by non-attentive listeners in dialogue. Listeners were found to produce less feedback when distracted by an ancillary task. A decreased number of feedback expressions communicating understanding was a particularly reliable indicator of distractedness. We argue this finding could be used to facilitate recognition of attentional states in dialogue system users. Index Terms: communicative feedback; dialogue; distraction; engagement; attention; dual task

    Dimensions of Segmental Variability: Interaction of Prosody and Surprisal in Six Languages

    Contextual predictability variation affects phonological and phonetic structure. Reduction and expansion of acoustic-phonetic features is also characteristic of prosodic variability. In this study, we assess the impact of surprisal and prosodic structure on phonetic encoding, both independently of each other and in interaction. We model segmental duration, vowel space size and spectral characteristics of vowels and consonants as a function of surprisal as well as of syllable prominence, phrase boundary, and speech rate. Correlates of phonetic encoding density are extracted from a subset of the BonnTempo corpus for six languages: American English, Czech, Finnish, French, German, and Polish. Surprisal is estimated from segmental n-gram language models trained on large text corpora. Our findings are generally compatible with a weak version of Aylett and Turk's Smooth Signal Redundancy hypothesis, suggesting that prosodic structure mediates between the requirements of efficient communication and the speech signal. However, this mediation is not perfect, as we found evidence for additional, direct effects of changes in surprisal on the phonetic structure of utterances. These effects appear to be stable across different speech rates.
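Segment-level surprisal of the kind estimated above can be computed from an n-gram model as the negative log-probability of a segment given its context. A minimal bigram sketch under assumptions (a toy segment sequence and add-one smoothing; the study's actual models are higher-order and trained on large corpora):

```python
import math
from collections import Counter

segments = list("abaabbaab")  # toy "phone" sequence standing in for a corpus
bigrams = Counter(zip(segments, segments[1:]))
unigrams = Counter(segments[:-1])
vocab = set(segments)

def surprisal(prev: str, seg: str) -> float:
    """-log2 P(seg | prev), estimated from bigram counts with add-one smoothing."""
    p = (bigrams[(prev, seg)] + 1) / (unigrams[prev] + len(vocab))
    return -math.log2(p)

# Frequent transition -> low surprisal; the reverse context is costlier.
print(round(surprisal("a", "b"), 3))
```

In the study's framework, values like these are then entered alongside prominence, boundary, and rate predictors to model segmental duration and spectral measures.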
    • 

    corecore